We study the detailed growth of a social networking site with full temporal information by examining the creation process of each
friendship relation that can collectively lead to the macroscopic properties of the network. We first study the reciprocal behavior of
users, and find that link requests are quickly responded to and that the distribution of reciprocation intervals decays in an exponential
form. The degrees of inviters/accepters are slightly negatively correlated with reciprocation time. In addition, the temporal feature
of the online community shows that the distributions of intervals of user behaviors, such as sending or accepting link requests,
follow a power law with a universal exponent, and peaks emerge for intervals of an integral day. We finally study the preferential
selection and linking phenomena of the social networking site and find that, for the former, a linear preference holds for preferential
sending and reception, and for the latter, a linear preference also holds for preferential acceptance, creation, and attachment. Based
on the linearly preferential linking, we put forward an analyzable network model which can reproduce the degree distribution of
the network. The research framework presented in the paper could provide a potential insight into how the micro-motives of users
lead to the global structure of online social networks.
1. Introduction

At present the World Wide Web (WWW) is undergoing a
landmark revolution from the traditional Web 1.0 to Web 2.0,
characterized by social collaborative technologies such as social
networking sites (SNSs), blogs, wikis, and folksonomy [1].
As a fast-growing business, many SNSs of different scopes and
purposes have emerged on the Internet, many of which, such as
MySpace [2, 3], Facebook [4-7], and Orkut [2, 8], are among
the most popular sites on the Web according to Alexa.com.
Users of these sites, by establishing friendship relations with
other users, can form online social networks (OSNs), which
provide an online private space for individuals and tools for
interacting with other people over the Internet. Both the popularity
of these sites and the availability of network data sets offer
a unique opportunity to study the dynamics of OSNs at scale.
It is believed that a proper understanding of how OSNs
evolve can provide insights into the network structure, allow
predictions of future growth, and enable exploration of human
behaviors on networks [9-13].

Recently, the structure and evolution of OSNs have been ex- 
tensively investigated by scholars of diverse disciplines. Golder 
et al. studied the structural properties of Facebook and found 
that the tail of its degree distribution is a power law which is 
different from the traditional exponential distribution of real- 
life social networks [4]. However, a mean of 179.53 friends per 
user for Facebook [4] or a mean of 137.1 friends per user for 
MySpace [2] is close to Dunbar's number of 150, which is a
limit on the number of relations a human can manage, based
on neocortex size [14]. Holme et al. studied the structural
evolution of Pussokram and found that its degree correlation
coefficient is always negative over time, i.e. disassortative mix- 
ing [15], which is in stark contrast to the significant assortative 
mixing for real-world social networks [16]. Viswanath et al. 
studied the structural evolution of the activity network of Face- 
book and found that the average degree, clustering coefficient, 
and average path length are all relatively stable over time [6]. 
Hu & Wang studied the evolution of Wealink [17, 18] and found 
that many network properties show obviously non-monotonic
features, including a sigmoid growth of network scale, which was
also observed by Chun et al. in Cyworld [19], and a transition 
from degree assortativity characteristic of real social networks 
to degree disassortativity characteristic of many OSNs which 
was also observed by Szell & Thurner in Pardus [20]. 

Despite the advancement, we find that to date most research 
on OSNs has focused on either the structural properties of a 
certain snapshot of networks or the multi-snapshots of evolving 
networks rather than detailed microscopic growth dynamics. 
A macroscopic view of network evolution usually makes it hard
to reveal the underlying mechanisms and growth processes
governing the large-scale features
of the observed network structure. In this paper, to gain better
insight into the growth of networks, based on empirical data, 
we study the detailed process of people making friends in an 
OSN from a microscopic point of view. Instead of investigating 
the global network structure or structural metric evolution, we 
focus directly on the microscopic user behaviors per se, i.e., we 
study the properties of a sequence of the arrivals of each edge 
or the formations of each friend relation. 
2. Data set

In this paper, we will focus on Wealink, a large SNS in China
whose users are mostly professionals, typically businessmen
and office clerks [17, 18]. Each registered user has a profile,
including his/her list of friends. If we view the users as nodes
$V$ and friend relations as edges $E$, an undirected friendship
network $G(V, E)$ can be constructed from Wealink. For privacy
reasons, the data, logged from 0:00:00 h on 11 May 2005 (the
inception day for the Internet community) to 15:23:42 h on 22
August 2007, include only each user's ID and list of friends,
and the time of sending link invitations and accepting requests
for each friend relation.

The final data format, as shown in Fig. 1, is a time-ordered
list of triples <From, To, When>. For instance, <$U_1$, $U_2$, $T_1$>
indicates that, at time $T_1$, user $U_1$ sends a link request to user
$U_2$, i.e., sends a friendship invitation to $U_2$, while <$U_2$, $U_1$,
$T_6$> indicates that, at $T_6$, $U_2$ accepts $U_1$'s request and they
become friends, i.e., a new edge connecting $U_2$ and $U_1$ appears in
the OSN. Thus friend relations, or network links, are established
only when the sent invitations are accepted. The online
community evolves dynamically, with new users joining
the community and new connections being established between
users.
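As an illustration of how such a log could be turned into edges, here is a minimal Python sketch. It assumes, which the source does not state, that a triple counts as an acceptance whenever the reverse invitation is still unanswered; the function name `pair_requests` and the toy log (the eight events as read from Fig. 1, with times written as integers) are hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, int]  # (From, To, When)


def pair_requests(log: List[Triple]):
    """Split a time-ordered log into accepted edges and pending invitations.

    Assumption (not stated in the source): a triple (a, b, t) counts as an
    acceptance when b has an unanswered invitation to a; otherwise it is a
    new invitation from a to b.
    """
    pending: Dict[Tuple[str, str], int] = {}      # (inviter, invitee) -> invitation time
    edges: List[Tuple[str, str, int, int]] = []   # (inviter, accepter, t_invite, t_accept)
    for src, dst, when in sorted(log, key=lambda row: row[2]):
        if (dst, src) in pending:
            edges.append((dst, src, pending.pop((dst, src)), when))
        else:
            pending[(src, dst)] = when
    return edges, pending


# The eight events as read from Fig. 1, with times written as integers 1..8.
fig1_log = [
    ("U1", "U2", 1), ("U1", "U3", 2), ("U3", "U1", 3), ("U4", "U2", 4),
    ("U2", "U4", 5), ("U2", "U1", 6), ("U5", "U2", 7), ("U2", "U5", 8),
]
edges, pending = pair_requests(fig1_log)
# Reciprocation interval of each accepted request.
intervals = [t_accept - t_invite for _, _, t_invite, t_accept in edges]
```

Each element of `edges` keeps both timestamps, so the reciprocation interval of a request is simply the difference between them.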

3. Reciprocal behavior 

Like most OSNs, in Wealink, a user invites another user to be
his/her friend; if the invited user accepts the invitation, a friend
relation is established between them and a new edge connecting
them appears (see Fig. 1). Thus the friendship is constructed
by bilateral agreement. The degree of a user, i.e., the number of
friends, appears on his/her profile, which can be browsed by
all the other users. During our data collection period, 273 209
sent link requests were accepted and only 186 had
not yet been accepted. Thus, in the following analysis, we will
focus on the 273 209 accepted link requests and their corresponding
acceptances, with full temporal information.

We first scrutinize the reciprocation of users, i.e., the sending
of a link request from one user to another (as happens at
$T_1$ in Fig. 1) causes the subsequent acceptance of the request ($T_6$).
Fig. 2(a) shows the complementary cumulative distribution
of the intervals between sending and accepting link requests
in Wealink. It is clear that users often quickly responded to
link requests and reciprocated them. The interval distribution
decays approximately exponentially. The least squares fitting
gives $P_c(t) \sim e^{\ldots}$ with $\ldots = 0.958$. In fact, as shown in
Fig. 2(b), 67.04% of all reciprocal behavior occurred within
one day (24 hours) after the initial link requests and 84.25% of
sent link requests were accepted within one month (30 days).
Wealink informs users by email of new incoming link requests.
It is quite possible that many users reciprocated requests as a
matter of courtesy and respect.
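A hedged sketch of one way to fit an exponential decay to a complementary cumulative distribution by least squares follows. It fits a straight line to the logarithm of the empirical distribution; whether this matches the fitting procedure actually used for Fig. 2(a) is an assumption, and the data below are synthetic, not Wealink data.

```python
import math
import random


def ccdf(samples):
    """Return the sorted values x and the empirical P(X >= x) for each."""
    xs = sorted(samples)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]


def fit_exponential_rate(samples):
    """Least-squares line through (x, ln P_c(x)); returns (rate, r_squared)."""
    xs, ps = ccdf(samples)
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, sxy * sxy / (sxx * syy)


# Synthetic intervals drawn from an exponential law with rate 0.5.
random.seed(0)
synthetic = [random.expovariate(0.5) for _ in range(5000)]
rate, r_squared = fit_exponential_rate(synthetic)
```

On exponentially distributed input the recovered rate should be close to the one used to generate the sample, and the coefficient of determination close to 1.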

Recently, inspired by the pioneering works of Barabási et al.,
physicists and computer scientists have shown increasing interest
in the research of human dynamics [21, 22], which focuses
on the time interval distribution between two consecutive
actions performed by individuals. Examples of such temporal
statistics include the inter-event time distribution between
two consecutive emails sent out by users, two consecutive visits
to a web portal by users, and two consecutive library loans
made by individuals. Empirical studies have shown that many
distributions of inter-event time follow a power law. However,
the exponential reciprocation interval distribution is in distinct
contrast to the power law distribution of waiting time in emails
(i.e., the time taken by users to reply to received emails).
Different emails differ in importance. A reasonable hypothesis
is that the importance of an email correlates with the
reputation/status of its sender or the "social closeness"
to the sender. Thus users can reply to received emails based on
some perceived priority, and the timing of the replies will be
heavy tailed. In contrast, there is no obvious priority for the
reciprocal behavior of users in OSNs; thus an exponential
distribution characterizes the reciprocation interval distribution
well.

An interesting question is whether the users tended to reciprocate
incoming link requests quickly regardless of how many
friends the inviters or accepters had. To answer the question,
we study the correlation between reciprocation time and the degrees
of inviters/accepters at the time of sending link requests.
Fig. 3(a) shows the density plot based on hexagonal binning for
the relation between degrees of inviters $k$ and reciprocation time
$t$, where the cases with small $k$ and $t$ dominate. The Pearson
correlation coefficient between $k$ and $t$ is -0.02, indicating a slightly
negative correlation. Fig. 3(b) shows the relation between $k$ and
mean reciprocation time $\langle t \rangle$ with logarithmic binning and error
bars; $\langle t \rangle$ exhibits a mild descending trend as $k$ increases. Fig. 4
shows the relation between degrees of accepters $k$ and reciprocation
time $t$, which is similar to that shown in Fig. 3. The
Pearson correlation coefficient between $k$ and $t$ is -0.05, also
indicating a mildly negative correlation.
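The two statistics used here, the Pearson coefficient and the mean reciprocation time per logarithmic degree bin, can be sketched as follows. The bin layout (powers of two) and the decision to skip degree 0 are assumptions made for this illustration; the source does not give the binning details.

```python
import math
import statistics
from collections import defaultdict


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def log_binned_mean(degrees, times):
    """Average t over degree bins [1, 2), [2, 4), [4, 8), ...

    Returns (lower bin edge, mean t, standard deviation of t) per bin.
    Degrees are assumed to be integers; degree 0 is skipped (assumption).
    """
    bins = defaultdict(list)
    for k, t in zip(degrees, times):
        if k >= 1:
            bins[k.bit_length() - 1].append(t)  # floor(log2 k) for integer k
    return [
        (2 ** b, statistics.mean(ts), statistics.pstdev(ts))
        for b, ts in sorted(bins.items())
    ]


# Toy data: reciprocation time shrinking with degree.
toy_k = [1, 2, 3, 4, 8]
toy_t = [10, 8, 6, 4, 2]
r = pearson(toy_k, toy_t)
binned = log_binned_mean(toy_k, toy_t)
```

With real data the per-bin standard deviation would supply the error bars of the kind shown in Fig. 3(b) and Fig. 4(b).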

4. Types of users and edges 

The total number of users during the data collection period
is $N$ = 223 482. The users can be divided into three
classes: active users who sent link requests but never received
any, passive users who received requests but never
sent any, and mixed users who both sent and received requests.
As shown in Tab. 1, we find that most users belong to the first
two classes. For the very popular SNSs, such as Facebook and
MySpace, due to the high activity and degree values of users,
most users could be mixed ones. However, Wealink is a very
professional OSN with a mean degree of only 2.53. The activity
of most users is low; after joining the OSN they either
send link requests to a few old users (very likely acquaintances
in real life) or receive link invitations from several old users.
Among the mixed users, there exists an obvious positive correlation
between the numbers of times of sending invitations and
accepting invitations, and the Pearson correlation coefficient is
0.48. As shown in Fig. 5, we find that the more link requests
a user sends, the more requests he/she will receive, and vice
versa.
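The three user classes follow directly from per-user counts of sent and received requests. A minimal sketch, with a hypothetical function name and toy data:

```python
from collections import Counter


def classify_users(invitations):
    """Label every user as 'active', 'passive' or 'mixed'.

    `invitations` is an iterable of (inviter, invitee) pairs.
    """
    sent, received = Counter(), Counter()
    for inviter, invitee in invitations:
        sent[inviter] += 1
        received[invitee] += 1
    kinds = {}
    for user in set(sent) | set(received):
        if sent[user] and received[user]:
            kinds[user] = "mixed"      # both sent and received requests
        elif sent[user]:
            kinds[user] = "active"     # sent only
        else:
            kinds[user] = "passive"    # received only
    return kinds, sent, received


toy_invitations = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "a")]
kinds, sent, received = classify_users(toy_invitations)
tally = Counter(kinds.values())
```

The `sent` and `received` counters of the mixed users are the two quantities whose correlation is reported above.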
<From, To, When>

<$U_1$, $U_2$, $T_1$> Invite
<$U_1$, $U_3$, $T_2$> Invite
<$U_3$, $U_1$, $T_3$> Accept
<$U_4$, $U_2$, $T_4$> Invite
<$U_2$, $U_4$, $T_5$> Accept
<$U_2$, $U_1$, $T_6$> Accept
<$U_5$, $U_2$, $T_7$> Invite
<$U_2$, $U_5$, $T_8$> Accept

Figure 1: Data format and evolution of OSN Wealink.




Figure 2: (a) The complementary cumulative distribution of time intervals between sending and accepting invitations. The solid line represents an exponential
distribution fit. (b) The ratio of sent link requests which were accepted on the $t$-th day after the initial invitations. Horizontal axis: $t$ (day).
Figure 3: Relation between degrees of inviters $k$ and reciprocation time $t$. (a) Density plot based on hexagonal binning. (b) Relation between $k$ and mean reciprocation
time $\langle t \rangle$ with logarithmic binning. Error bars with ±1 standard deviation are also shown.
Figure 4: Relation between degrees of accepters $k$ and reciprocation time $t$. (a) Density plot based on hexagonal binning. (b) Relation between $k$ and mean
reciprocation time $\langle t \rangle$ with logarithmic binning. Error bars with ±1 standard deviation are also shown.




Figure 5: Correlation between the numbers of times of sending invitations $n_1$ and accepting invitations $n_2$. (a) Density plot based on hexagonal binning. (b) Relation
between $n_1$ and $\langle n_2 \rangle$ with logarithmic binning. (c) Relation between $n_2$ and $\langle n_1 \rangle$ with logarithmic binning. Error bars with ±1 standard deviation are also shown.
Table 1: The numbers of users of different types.

Type          Active     Mixed     Passive
Number        128 589    16 060    78 833
Percentage    57.54%     7.19%     35.27%



Table 2: The numbers of edges of different types.

Type          Old-Old    Old-New    New-Old    New-New
Number        52 980     82 740     134 236    3 253
Percentage    19.39%     30.28%     49.13%     1.19%



The final network density is only $1.09 \times 10^{-5}$. What causes
this sparseness? As shown in Tab. 2, the $E$ = 273 209
edges can be divided into four classes, where type A-B means
that A users initially sent link requests to B users. "Old" means
that the user joined the network some time ago and has either
sent at least one link request to other users or received at least
one link request from other users. "New" means that the user has
joined the network but has neither sent link requests to other
users nor received link requests from other users. In Wealink,
most links are established by old users
sending requests to new users (more than 30%) and new users
sending requests to old users (approximately 50%). The number
of edges of Old-Old type is relatively small, leading to the
sparseness of the network.
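The density figure and the four edge classes can be reproduced along the following lines. The density formula for an undirected network, $2E/(N(N-1))$, is standard. The edge typing is only a sketch under an explicit assumption: a user is treated as "Old" at the moment an edge is created if he/she has sent or received at least one other request before that moment. This reading is consistent with the example of Fig. 1, but the source does not spell out the exact rule.

```python
from collections import Counter


def density(n_nodes, n_edges):
    """Density of an undirected simple network."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def edge_types(edges):
    """Count A-B edge types, A being the inviter and B the accepter.

    `edges` holds (inviter, accepter, t_invite, t_accept) tuples.
    Assumption: a user is 'Old' when an edge is created if he/she has
    taken part in at least one other request before that moment.
    """
    events = []
    for inviter, accepter, t_invite, t_accept in edges:
        events.append((t_invite, 0, inviter, accepter))   # 0 marks an invitation
        events.append((t_accept, 1, inviter, accepter))   # 1 marks an acceptance
    events.sort()
    seen = Counter()     # requests each user has sent or received so far
    counts = Counter()
    for _, is_accept, inviter, accepter in events:
        if not is_accept:
            seen[inviter] += 1
            seen[accepter] += 1
        else:
            a = "Old" if seen[inviter] > 1 else "New"
            b = "Old" if seen[accepter] > 1 else "New"
            counts[f"{a}-{b}"] += 1
    return counts


# Wealink totals quoted in the text.
wealink_density = density(223482, 273209)

# Edges as read from Fig. 1, with times written as integers.
fig1_edges = [("U1", "U3", 2, 3), ("U4", "U2", 4, 5), ("U1", "U2", 1, 6), ("U5", "U2", 7, 8)]
fig1_counts = edge_types(fig1_edges)
```

Under this rule the edge created at $T_6$ in Fig. 1 is of Old-Old type and the one created at $T_8$ is of New-Old type.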

5. Temporal characteristics of linking 

We study the time interval distribution between two link
events. As shown in Fig. 6, the distributions of intervals between
consecutive sending link requests (i.e., between $T_1$ and
$T_2$, $T_2$ and $T_4$, and so on in Fig. 1), accepting requests (i.e.,
between $T_3$ and $T_5$, $T_5$ and $T_6$, and so on in Fig. 1) and any two
events (i.e., between $T_i$ and $T_{i+1}$ ($i \geq 1$) in Fig. 1) all follow a
power law with a universal exponent 1.89, which diverges from
the exponential distribution predicted by a traditional Poisson
process and indicates bursts of rapidly occurring events separated
by long periods of inactivity. Several peaks appear for
intervals of an integral day in the tails of the distributions,
indicating the daily periodicity corresponding to human life habits.
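A short sketch of how inter-event intervals and a power-law exponent could be obtained from a list of event times. The exponent estimator below is the standard continuous maximum-likelihood formula, $\gamma = 1 + n / \sum \ln(x / x_{\min})$, which is swapped in here for illustration; the source does not say how the exponent 1.89 was fitted. The sample is synthetic.

```python
import math
import random


def inter_event_intervals(times):
    """Positive gaps between consecutive events of a site-wide event list."""
    ordered = sorted(times)
    return [b - a for a, b in zip(ordered, ordered[1:]) if b > a]


def powerlaw_exponent_mle(samples, x_min):
    """Continuous maximum-likelihood estimate of gamma in P(x) ~ x^(-gamma)."""
    tail = [x for x in samples if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


# Synthetic intervals drawn from a power law with exponent 1.89 (x_min = 1)
# by inverse-transform sampling.
random.seed(1)
gamma_true = 1.89
synthetic = [(1.0 - random.random()) ** (-1.0 / (gamma_true - 1.0)) for _ in range(20000)]
gamma_hat = powerlaw_exponent_mle(synthetic, x_min=1.0)
```

For real timestamps, the daily peaks mentioned above would show up as excess counts in the intervals near multiples of 24 hours.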

6. Preferential selection 

Preferential selection means that, for a time-ordered list of
individual appearances, the more frequently an individual has
appeared before, the more likely the individual is to appear again.
We separate the preferential selection into two aspects: preferential
sending and preferential reception. Preferential sending
describes the mechanism by which users send new link requests
with probability proportional to some power of the numbers of
link invitations they have sent before, and preferential reception
describes the mechanism by which users receive new link requests
with probability proportional to some power of the numbers of
link invitations they have received before.

Fig. 7 presents the schematic illustration of sending and re- 
ception sequences of OSNs. The former is a time-ordered list 
of users sending link invitations, and the latter is a time-ordered 
list of users receiving link invitations. In both sequences, the 
more frequently a user appeared before, the more likely the user
will occur once again.

Let $k_i$ be the number of sent or received link invitations for
user $i$. The probability that user $i$ with frequency $k_i$ is chosen to
send or receive a link request once again can be expressed as

$$\Pi(k_i) = \frac{k_i^{\beta}}{\sum_j k_j^{\beta}}. \qquad (1)$$

Thus we can compute the probability $\Pi(k)$ that an old user
of frequency $k$ is chosen, and it is normalized by the number of
users of frequency $k$ that exist just before this step [23, 24]:

$$\Pi(k) = \frac{\sum_t [e_t = v \wedge k_v(t-1) = k]}{\sum_t |\{u : k_u(t-1) = k\}|}, \qquad (2)$$

where $e_t = v \wedge k_v(t-1) = k$ represents that at time $t$ the old
user whose frequency is $k$ at time $t-1$ is chosen. We use $[\cdot]$ to
denote a predicate (which takes a value of 1 if the expression
is true, else 0). Generally, $\Pi(k)$ has significant fluctuations,
particularly for large $k$. To reduce the noise level, instead of
$\Pi(k)$, we study the cumulative function:

$$\kappa(k) = \int_0^k \Pi(k')\,dk' \sim k^{\beta+1}. \qquad (3)$$

Fig. 8 shows how the frequency $k$ of users is related to the
preference metric $\kappa$. $\beta \approx 1$ for both preferential sending and
preferential reception, indicating linear preference.
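The measurement in Eq. (2) and its cumulative form in Eq. (3) can be sketched for a sending or reception sequence as follows. This is a deliberately simple illustration whose cost grows with the sequence length times the number of distinct frequencies; the function names are hypothetical, and the integral of Eq. (3) is approximated by a running sum over integer frequencies.

```python
from collections import Counter


def preference_curve(sequence):
    """Estimate Pi(k) of Eq. (2) from a time-ordered list of user ids."""
    freq = Counter()       # appearances of each user so far
    n_with = Counter()     # number of users currently at frequency k
    chosen = Counter()     # numerator: old users of frequency k that were chosen
    exposure = Counter()   # denominator: users of frequency k just before each step
    for user in sequence:
        for k, n in n_with.items():
            exposure[k] += n
        k = freq[user]
        if k > 0:                      # an old user is chosen again
            chosen[k] += 1
            n_with[k] -= 1
        n_with[k + 1] += 1
        freq[user] = k + 1
    return {k: chosen[k] / exposure[k] for k in sorted(exposure) if exposure[k] > 0}


def cumulative_preference(pi):
    """Running sum of Pi(k), a discrete stand-in for kappa(k) in Eq. (3)."""
    total, kappa = 0.0, {}
    for k in sorted(pi):
        total += pi[k]
        kappa[k] = total
    return kappa


toy_sequence = ["a", "b", "a", "a", "b"]
pi = preference_curve(toy_sequence)
kappa = cumulative_preference(pi)
```

On a long sequence, the slope of $\ln \kappa(k)$ against $\ln k$ estimates $\beta + 1$, so a slope near 2 corresponds to linear preference.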

It is natural that, in the sending or reception sequence, the
number of distinct users $N$ increases with sequence length $T$.
Fig. 9 shows the growth pattern of $N$ with $T$ for Wealink:
$N \propto T$, indicating that the appearance probability of new
users is a constant, $\alpha = N/T$. According to the Simon model
[25], based on linear preferential selection and constant appearance
probability of new users, the complementary cumulative
distributions of the numbers of sent invitations and received
invitations for the users of Wealink follow a power law
$P_c(n) \sim n^{-1/(1-\alpha)}$. Based on empirical data, for the inviters, we
obtain $\alpha = 0.53$ and $P_c(n) \sim n^{-2.13}$, and for the receivers,
$\alpha = 0.35$ and $P_c(n) \sim n^{-1.54}$. Fig. 10 shows the distribution
functions of the frequencies of inviters and receivers, and the
tails of both distributions show power law behavior. The power
law exponents are in reasonable agreement with the predicted
values of the Simon model, $1/(1 - \alpha)$.
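The Simon process described here is easy to simulate: at each step a new user appears with probability $\alpha$, and otherwise an earlier entry of the sequence is copied, which selects users in proportion to their frequency. The sketch below uses hypothetical function names and a synthetic run; it only checks the predicted exponents $1/(1-\alpha)$ quoted in the text and the constant appearance rate of new users.

```python
import random


def simon_sequence(length, alpha, seed=None):
    """Generate a sequence of user ids by the Simon process."""
    rng = random.Random(seed)
    sequence = []
    next_user = 0
    for _ in range(length):
        if not sequence or rng.random() < alpha:
            sequence.append(next_user)              # a new user appears
            next_user += 1
        else:
            sequence.append(rng.choice(sequence))   # linear preferential selection
    return sequence


def predicted_ccdf_exponent(alpha):
    """Exponent of P_c(n) ~ n^(-1/(1-alpha)) predicted by the Simon model."""
    return 1.0 / (1.0 - alpha)


seq = simon_sequence(20000, 0.35, seed=7)
alpha_hat = len(set(seq)) / len(seq)   # N / T, the appearance rate of new users
```

Copying a uniformly random earlier entry is what makes the selection linear: a user who fills $k$ of the $T$ positions is picked with probability $k/T$.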

7. Preferential linking 

The degree distribution of Wealink shows power law features
[17]. This kind of distribution can be produced, as indicated
by the Barabási-Albert (BA) model [26], through linear preferential
attachment, where new users tend to attach to already
popular old users. In Wealink, as shown in Fig. 1, only when the
sent link invitations are accepted can the inviters and receivers
Figure 6: Temporal characteristics of link request sending and acceptance.



<From, To, When> (invitations only)
<$U_1$, $U_2$, $T_1$> Invite
<$U_1$, $U_3$, $T_2$> Invite
<$U_4$, $U_2$, $T_4$> Invite
<$U_5$, $U_2$, $T_7$> Invite

Sending sequence: $U_1$, $U_1$, $U_4$, $U_5$, ...
Reception sequence: $U_2$, $U_3$, $U_2$, $U_2$, ...

Figure 7: A schematic illustration of sending and reception sequences for Wealink.




Figure 8: Testing preferential selection for users of sending and receiving invitations in Wealink.
Figure 10: The complementary cumulative distributions of the numbers of sent and received invitations for users of Wealink. Both distributions have a power law
tail with slope -2.29 for sent invitations and -1.95 for received invitations.
become friends and new edges appear in the social network.
When new users establish friend relationships with old users, or
new edges are established between old users, the old users with
large degrees could be preferentially selected.

To test the preference feature for different types of link
establishment, we separate the preferential linking into three aspects:
preferential acceptance, creation, and attachment. Preferential
acceptance implies that the larger an old user's degree is, the
more likely he/she will accept link invitations from other old
users. Preferential creation implies that the larger an old user's
degree is, the more likely his/her link invitations will be
accepted by the other old users. Preferential attachment
implies that new users tend to attach to already popular old users
with large degrees.

For instance, in Fig. 1, at time $T_6$, a new edge appears
between two old users $U_1$ and $U_2$. Old user $U_2$ who accepts a
link invitation can be chosen by preferential acceptance, and
old user $U_1$ who sends a link invitation can be chosen by
preferential creation. At time $T_8$, a new edge appears between old
user $U_2$ and new user $U_5$, and old user $U_2$ can be chosen by
preferential attachment. Fig. 11 shows the relation between the
degree $k$ of users and the preference metric $\kappa$. We find that $\beta \approx 1$
for preferential acceptance, creation, and attachment, indicating
linear preference.

The property of linear preference for the network appears to
generalize to other OSNs. Mislove et al. studied the evolution
of Flickr; they defined preferential creation as a mechanism
by which users create new links in proportion to their outdegree,
and preferential reception as a mechanism where users
receive new links in proportion to their indegree [27]. They
found that linear preference holds for both cases, i.e., users
tend to create and receive links in proportion to their outdegree
and indegree, respectively. Leskovec et al. studied the
evolution of Flickr, del.icio.us, Yahoo! Answers, and LinkedIn,
and examined whether new users preferentially link to
old users with large degrees [24]. They found that Flickr
and del.icio.us show linear preference, $\Pi(k) \sim k$, and
Yahoo! Answers shows slightly sublinear preference, $\Pi(k) \sim k^{0.9}$.
LinkedIn has a different pattern: for low degrees $k$, $\Pi(k) \sim k^{0.6}$,
and thus the preference is not obvious; however, for large
degrees, $\Pi(k) \sim k^{1.2}$, indicating superlinear preference, i.e., the
edges to higher degree users are more sticky and high-degree
users get super-preferential treatment. Even though there are
minor differences in the exponents $\beta$ for different networks, we
can say that $\beta \approx 1$, implying that linear preference may be
universal for OSNs.

According to this linear preference, we put forward a realistic
network model. Starting with a small network with $m_0$ nodes,
at every time step, there are two alternatives.

A. Growth and preferential attachment. With probability $p$,
we add a new node with $m_1$ ($< m_0$) edges that will be connected
to the nodes already present in the network based on the preferential
attachment rule of the BA model, i.e., the probability $\Pi$
that a new node will be connected to old node $i$ with degree $k_i$
is $\Pi(k_i) = k_i / \sum_j k_j$.

B. Preferential creation and acceptance. With probability
$q = 1 - p$, we add $m_2$ ($m_1 + m_2 < m_0$) new edges connecting the
old nodes. The two endpoints of the edges are chosen according
to linear preference $\Pi(k_i) = k_i / \sum_j k_j$.

After $t$ time steps, the model leads to a network with average
number of nodes $\langle N \rangle = m_0 + pt$. For sparse real-world networks,
$p > q$. When $p = 1$, the model is reduced to the traditional BA
model. The model considers the introduction of new nodes and
new edges, which can be established either between new nodes
and old nodes or between old nodes. Most importantly, the
model integrates linear preference for acceptance, creation and
attachment found in the evolution process of real networks, and
thus captures realistic features of network growth.
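The two alternatives of the model can be sketched as a short simulation. Several details are assumptions made for this illustration and are not specified in the text: the seed network is a ring of $m_0$ nodes, self-loops and duplicate edges are rejected, and step B gives up after a bounded number of retries. Degree-proportional choice is implemented by drawing from a list in which each node appears once per unit of degree.

```python
import random
from collections import Counter


def grow_network(steps, p, m0=3, m1=1, m2=1, seed=None):
    """Simulate the growth model; returns (node count, edge set, degrees)."""
    if m0 < 3 or m1 > m0:
        raise ValueError("this sketch needs m0 >= 3 and m1 <= m0")
    rng = random.Random(seed)
    edges = set()
    stubs = []  # node i appears k_i times, so rng.choice(stubs) is linear preference

    def add_edge(u, v):
        edges.add((min(u, v), max(u, v)))
        stubs.extend((u, v))

    for i in range(m0):                    # assumed seed network: a ring
        add_edge(i, (i + 1) % m0)
    n = m0
    for _ in range(steps):
        if rng.random() < p:               # A. growth and preferential attachment
            targets = set()
            while len(targets) < m1:
                targets.add(rng.choice(stubs))
            for v in targets:
                add_edge(n, v)
            n += 1
        else:                              # B. preferential creation and acceptance
            added, tries = 0, 0
            while added < m2 and tries < 100:
                tries += 1
                u, v = rng.choice(stubs), rng.choice(stubs)
                if u != v and (min(u, v), max(u, v)) not in edges:
                    add_edge(u, v)
                    added += 1
    degree = Counter(stubs)
    return n, edges, degree


n_nodes, edge_set, degree = grow_network(2000, 0.8, seed=1)
```

With `p=1` the second branch is never taken and the procedure reduces to BA-style growth, in line with the remark above.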

The model has an analytic solution. Its stationary average
degree distribution for large $k$ is [28]

$$P(k) \sim k^{-\frac{3pm_1 + 4qm_2}{pm_1 + 2qm_2}}, \qquad (4)$$

showing a scale-free feature. According to Tab. 2, we obtain
$p = p_{\text{Old-New}} + p_{\text{New-Old}} = 0.7941$ and $q = 0.1939$. The links
created between two new users are few and can thus be neglected.
In addition, $m_1 = m_2 = 1$ for real growth of the network.
Based on the parameters and Eq. (4), we obtain $P(k) \sim k^{-2.67}$.
Fig. 12(a) shows the numerical result which is obtained by averaging
over 10 independent realizations with $p = 0.7941$ and
$N$ = 223 482. Its degree exponent 2.62 agrees well with the predicted
value of 2.67. Fig. 12(a) also presents the complementary
cumulative degree distribution of Wealink, and we find that
the predicted value of the degree exponent 2.67 of the model
is in reasonable agreement with the real value 2.91. The difference
between real and theoretical values may arise from the
fact that $p$ and $q$ vary with time and are not constants.
Fig. 12(b) shows the evolution of $p$ and $q$ and demonstrates this
fact.
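The arithmetic behind the predicted exponent can be checked directly from Eq. (4), read here as $(3pm_1 + 4qm_2)/(pm_1 + 2qm_2)$. The function name is hypothetical; the inputs are the values quoted in the text.

```python
def degree_exponent(p, q, m1=1, m2=1):
    """Degree exponent gamma of P(k) ~ k^(-gamma) according to Eq. (4)."""
    return (3 * p * m1 + 4 * q * m2) / (p * m1 + 2 * q * m2)


gamma_model = degree_exponent(0.7941, 0.1939)   # parameters taken from Tab. 2
gamma_ba = degree_exponent(1.0, 0.0)            # pure growth, the BA limit
ccdf_slope = -(gamma_model - 1.0)               # slope of the complementary cumulative distribution
```

The value rounds to 2.67, and the limit $p = 1$ recovers the BA exponent 3. Since a complementary cumulative distribution decays with exponent $\gamma - 1$, the slopes quoted in the caption of Fig. 12 (-1.91 and -1.62) correspond to the degree exponents 2.91 and 2.62.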

8. Summary and discussion 

To conclude, we have unveiled the detailed growth of an OSN
from a microscopic perspective. Our study shows that the
distribution of intervals between sending and accepting link requests
decays approximately exponentially, which is in obvious contrast
to the power law distribution of waiting time in emails,
and that there exists a slightly negative correlation between
reciprocation time and degrees of inviters/accepters. The distributions
of intervals of user behaviors, such as sending or accepting link
requests, follow a power law with a universal exponent,
indicating the bursty nature of user activity. We finally study the
preference phenomena of the OSN and find that for preferential
selection linear preference holds for preferential sending and
reception, and for preferential linking linear preference also holds
for preferential acceptance, creation and attachment. We
propose a network model which captures real features of network
growth and can reproduce the degree distribution of the OSN.

It is noteworthy that, although there is a close relation be- 
tween the microscopic growth of networks and global network 
structure or structural metric evolution, it is still quite hard to 
bridge the gap between macro and micro perspectives of OSNs. 
Figure 12: (a) The complementary cumulative degree distributions of Wealink and the model. Both distributions have a power law tail with slope -1.91 for Wealink
and -1.62 for the model. (b) Evolution of the proportion of two kinds of edge. The Old-New and New-Old types in Tab. 2 are integrated into the new-old type and
the old-old type still corresponds to the Old-Old type in Tab. 2.
For instance, preferential linking may possibly supply some
information on the degree distribution of networks; however, it
may not tell us much about the other properties of networks, 
such as clustering or community structure. Thus to gain an in- 
depth comprehension of OSNs, other microscopic behaviors of 
users, such as homophily, need to be studied in detail; a comple- 
mentary research framework integrating macro and micro per- 
spectives will also be indispensable. 